Supported Stages/2024 and added a basic CI test #27

Merged
merged 85 commits into from
Dec 5, 2024

Conversation

kvrigor
Member

@kvrigor kvrigor commented Oct 9, 2024

* work-in-progress *

@jjokella
Contributor

I added a small PDAF-related fix for the value of CMAKE_CXX_COMPILER_ID, which changed from Intel (Stages/2023) to IntelLLVM (Stages/2024).

FindNetCDF:
- Split NetCDF_ROOT into NetCDF_F90_ROOT and NetCDF_C_ROOT. C and Fortran NetCDF
  root directories can differ; moreover there could be multiple NetCDF C
  implementations installed on a system, e.g. serial NetCDF and parallel NetCDF.
- In particular, ParFlow requires the parallel NetCDF header
  (`netcdf_par.h`)
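
A quick way to tell whether a given NetCDF C installation is the parallel one is to look for `netcdf_par.h` in its include directory. The following is a minimal sketch, assuming the conventional `<root>/include` layout; the function name `has_parallel_netcdf` is hypothetical and not part of this repository.

```sh
# Hypothetical helper: succeed if the NetCDF C root ships the parallel header.
# Assumption: headers live in <root>/include.
has_parallel_netcdf() {
    [ -f "$1/include/netcdf_par.h" ]
}

# Usage (the path is a placeholder for whatever is passed as NetCDF_C_ROOT):
#   has_parallel_netcdf /path/to/netcdf-c || echo "serial NetCDF C only"
```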

CI:
- Forced caching of component model repos
- Specified full path to dependencies and component models

BuildParFlow.cmake:
- NETCDF_INCLUDE_DIR and NETCDF_LIBRARY aren't properly set even when NETCDF_DIR is supplied, so we set them explicitly.
- Reordered options for readability

BuildOASIS3MCT.cmake:
- Forgot to change NetCDF_ROOT to NetCDF_F90_ROOT
- `-ffree-line-length-none -ffixed-line-length-none` required for F90
  source files
- `-ffree-form` must be disabled to correctly compile F77 (`.F`) source
  files

Parallel build jumbles the build log.
Contributor

@mvhulten mvhulten left a comment

I have not reviewed changes in submodules parflow and pdaf.
For the rest, it looks fine.

@mvhulten
Contributor

I am still compiling. I will see whether I can do that in parallel (-j 8) with the new build_tsmp2.sh script…

Cheatsheet:
```sh
cat -e cmake/FindNetCDF.cmake

sed -i -e '$a\' cmake/FindNetCDF.cmake
```

Sources:

https://www.shellhacks.com/find-out-text-file-line-endings-lf-or-clrf/
https://unix.stackexchange.com/a/31955
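
The cheatsheet commands inspect line endings and append a missing final newline. A minimal sketch of a check that reports whether a file already ends with a newline, so the `sed` fix is applied only when needed; the function name `ends_with_newline` is hypothetical.

```sh
# Hypothetical helper: succeed if the file is non-empty and its last byte is a newline.
ends_with_newline() {
    # Command substitution strips a trailing newline, so the captured
    # text is empty exactly when the last byte is "\n".
    [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}

# Usage:
#   ends_with_newline cmake/FindNetCDF.cmake || sed -i -e '$a\' cmake/FindNetCDF.cmake
```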
@jjokella
Contributor

jjokella commented Nov 28, 2024

Hey everyone, I am currently checking my PDAF-related builds and test cases with stages-2024. Some fixes will be coming from #43.

These include:

  • CLM3.5 bugfixes
  • Update of the PDAF-patched ParFlow to v3.13.0, plus bugfixes

Bugfixes are usually related to IntelLLVM compilers ifx and icx being more pedantic than their predecessors.

@jjokella
Contributor

Update: I encountered more errors than expected in #43.

I hope to be done with it on Monday. If there is any time pressure, feel free to merge already and I will merge #43 into master later.

@kvrigor
Member Author

kvrigor commented Nov 30, 2024

Take your time @jjokella ! Making PDAF models work is within the scope of this PR; also it's best that Stages/2024 fully supports all component models. Let's have PDAF support as a final milestone before merging this PR.

Updates and bugfixes in the component models
- ParFlow: Update to version `v3.13.0` and add PDAF fixes, producing the new tag `v3.13.0-pdaf` (https://github.com/HPSCTerrSys/parflow/releases/tag/v3.13.0-pdaf)
- CLM3.5: Bugfix and new tag `tsmp-patches-v0.3` (https://github.com/HPSCTerrSys/CLM3.5/releases/tag/tsmp-patches-v0.3)
- PDAF: Update with general developments.
@mvhulten
Contributor

mvhulten commented Dec 2, 2024

I am having difficulties loading the right environment.

In build_tsmp2.sh, Stages/2024 should be loaded if tsmp2_env is not set. However, in everything I tried, software from Stages/2023 is used, which may be the reason for errors like

ld: /p/software/fs/jurecadc/stages/2023/software/psmpi/5.7.0-1-intel-compilers-2022.1.0/lib/libmpi.so: undefined reference to `cudaGetDeviceCount@libcudart.so.11.0'
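
The fallback described here can be expressed with a shell parameter default. This is only a sketch of the idea, not the code in `build_tsmp2.sh`: the function name `resolve_tsmp2_env` is hypothetical, and the default file name is taken from the `TSMP2_ENV` value printed in the build output in this thread.

```sh
# Hypothetical sketch: choose the environment file, defaulting to the Stages/2024 one.
# $1 = TSMP2 root directory, $2 = user-supplied environment file (may be empty).
resolve_tsmp2_env() {
    # ${2:-default} expands to the default when $2 is unset or empty.
    echo "${2:-$1/env/jsc.2024_Intel.sh}"
}

# Usage:
#   . "$(resolve_tsmp2_env "$PWD" "$tsmp2_env")"
```

Printing the resolved path before sourcing it makes it easier to spot a stale value inherited from an earlier session.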

@jjokella
Contributor

jjokella commented Dec 2, 2024

@mvhulten I can confirm that the libcudart error comes from loading the wrong stage. I have seen a runtime error related to libcudart when I accidentally ran a Stages/2024 executable in a Stages/2023 environment.

As I understand it, you see these errors when compiling with Stages/2023 on the stages-2024 branch? Supporting Stages/2023 builds should remain a goal, so these issues may be worth a PR into this PR, like the one I opened for PDAF builds.

What is still needed is probably a disentangling of libraries / compiler options for Intel (Stages/2023) and IntelLLVM (Stages/2024).
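
A stage mismatch of this kind can be caught before configuring by checking which stage directory the MPI compiler wrapper on `PATH` belongs to. The sketch below assumes installation paths contain a `/stages/<year>/` component, as the paths in the error message do; both function names are hypothetical.

```sh
# Hypothetical helper: print the stage year embedded in a path such as
# /p/software/.../stages/2023/software/..., or nothing if there is none.
stage_of_path() {
    printf '%s\n' "$1" | sed -n 's|.*/stages/\([0-9][0-9]*\)/.*|\1|p'
}

# Hypothetical check: fail when mpicc on PATH comes from a different stage.
# $1 = expected stage year, e.g. 2024.
check_stage() {
    expected=$1
    wrapper=$(command -v mpicc 2>/dev/null) || return 0   # no mpicc, nothing to check
    found=$(stage_of_path "$wrapper")
    if [ -n "$found" ] && [ "$found" != "$expected" ]; then
        echo "mpicc is from Stages/$found, expected Stages/$expected: $wrapper" >&2
        return 1
    fi
}

# Usage:
#   check_stage 2024 || exit 1
```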

@mvhulten
Contributor

mvhulten commented Dec 2, 2024

No, I really do intend to compile with Stages/2024. It fails for me on JURECA when I follow the quick start, as shown below. It works fine when I follow the advanced guide.

vanhulten1@jrlogin09:.../vanhulten1/models/TSMP2$ ./build_tsmp2.sh --eCLM
set model-id and component string
set component source dir
submodule models/eCLM aleady exists. Do you want overwrite it? (y/n) y
Overwrite submodule models/eCLM
Submodule path 'models/eCLM': checked out 'c9f334838b4584e8543dc3e6f7e50661433b2055'
set CMAKE options
source environment

Currently Loaded Modules:
  1) Stages/2024      (S)  15) libpciaccess/.0.17     (H)  29) Tcl/8.6.13                  43) netCDF-Fortran/4.6.1         57) libtirpc/.1.3.3   (H)
  2) GCCcore/.12.3.0  (H)  16) hwloc/2.9.1            (g)  30) Blosc/.1.21.5          (H)  44) PnetCDF/1.12.3               58) PCRE/.8.45        (H)
  3) zlib/.1.2.13     (H)  17) PMIx/4.2.6                  31) bzip2/.1.0.8           (H)  45) cURL/8.0.1                   59) util-linux/.2.39  (H)
  4) binutils/.2.40   (H)  18) MPI-settings/UCX            32) gzip/.1.12             (H)  46) Szip/.2.1.1             (H)  60) libdap/.3.20.11   (H)
  5) Intel/2023.2.1        19) ParaStationMPI/5.9.2-1 (g)  33) lz4/.1.9.4             (H)  47) ncurses/.6.4            (H)  61) GSL/2.7
  6) numactl/2.0.16        20) BLIS/0.9.0                  34) zstd/.1.5.5            (H)  48) libreadline/.8.2        (H)  62) netCDF-C++4/4.3.1
  7) CUDA/12          (g)  21) OpenBLAS/0.3.23             35) JasPer/.4.0.0          (H)  49) SQLite/.3.42.0          (H)  63) libarchive/3.6.2
  8) UCX-settings/RC       22) imkl/2023.2.0               36) NASM/.2.16.01          (H)  50) libffi/.3.4.4           (H)  64) ESMF/8.5.0
  9) UCX/default      (g)  23) FlexiBLAS/3.3.1             37) libjpeg-turbo/.2.1.5.1 (H)  51) Python/3.11.3                65) NCO/5.1.8
 10) pscom/.5-default (H)  24) FFTW/3.3.10                 38) libpng/.1.6.39         (H)  52) imkl-FFTW/2023.2.0           66) CMake/3.26.3
 11) XZ/.5.4.2        (H)  25) FFTW.MPI/3.3.10             39) libaec/1.0.6                53) expat/.2.5.0            (H)  67) gettext/.0.21.1   (H)
 12) libxml2/.2.11.4  (H)  26) ScaLAPACK/2.2.0-fb          40) ecCodes/2.31.0              54) UDUNITS/.2.2.28         (H)  68) Perl/5.36.1
 13) OpenSSL/1.1           27) Hypre/2.31.0-cpu            41) HDF5/1.14.2                 55) Java/11 -> Java/11.0.16      69) git/2.41.0-nodocs
 14) libevent/.2.1.12 (H)  28) Silo/4.11.1                 42) netCDF/4.9.2                56) ANTLR/.2.7.7-Java-11    (H)

  Where:
   S:  Module is Sticky, requires --force to unload or purge
   g:  Built with GPU support
   H:             Hidden Module

 

====================
== TSMP2 settings ==
====================
MODEL_ID: eCLM
TSMP2_DIR: /p/project1/cdetect/vanhulten1/models/TSMP2
TSMP2_ENV: /p/project1/cdetect/vanhulten1/models/TSMP2/env/jsc.2024_Intel.sh
BUILD_DIR: /p/project1/cdetect/vanhulten1/models/TSMP2/bld/JURECADC_eCLM
INSTALL_DIR: /p/project1/cdetect/vanhulten1/models/TSMP2/bin/JURECADC_eCLM
CMAKE command:
cmake -S /p/project1/cdetect/vanhulten1/models/TSMP2 -B /p/project1/cdetect/vanhulten1/models/TSMP2/bld/JURECADC_eCLM    -DeCLM=ON    -DCMAKE_INSTALL_PREFIX=/p/project1/cdetect/vanhulten1/models/TSMP2/bin/JURECADC_eCLM  |& tee /p/project1/cdetect/vanhulten1/models/TSMP2/bld/eCLM_2024-12-02_15-50.log 
== CMAKE GENERATE PROJECT start
-- The C compiler identification is Intel 2021.10.0.20230609
-- The CXX compiler identification is Intel 2021.10.0.20230609
-- The Fortran compiler identification is Intel 2021.10.0.20230609
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - failed
-- Check for working C compiler: /p/software/jurecadc/stages/2023/software/psmpi/5.7.0-1-intel-compilers-2022.1.0/bin/mpicc
-- Check for working C compiler: /p/software/jurecadc/stages/2023/software/psmpi/5.7.0-1-intel-compilers-2022.1.0/bin/mpicc - broken
CMake Error at /p/software/fs/jurecadc/stages/2024/software/CMake/3.26.3-GCCcore-12.3.0/share/cmake-3.26/Modules/CMakeTestCCompiler.cmake:67 (message):
  The C compiler

    "/p/software/jurecadc/stages/2023/software/psmpi/5.7.0-1-intel-compilers-2022.1.0/bin/mpicc"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: /p/project1/cdetect/vanhulten1/models/TSMP2/bld/JURECADC_eCLM/CMakeFiles/CMakeScratch/TryCompile-09oGk3
    
    Run Build Command(s):/p/software/fs/jurecadc/stages/2024/software/CMake/3.26.3-GCCcore-12.3.0/bin/cmake -E env VERBOSE=1 /usr/bin/gmake -f Makefile cmTC_ef9a1/fast && /usr/bin/gmake  -f CMakeFiles/cmTC_ef9a1.dir/build.make CMakeFiles/cmTC_ef9a1.dir/build
    gmake[1]: Entering directory '/p/project1/cdetect/vanhulten1/models/TSMP2/bld/JURECADC_eCLM/CMakeFiles/CMakeScratch/TryCompile-09oGk3'
    Building C object CMakeFiles/cmTC_ef9a1.dir/testCCompiler.c.o
    /p/software/jurecadc/stages/2023/software/psmpi/5.7.0-1-intel-compilers-2022.1.0/bin/mpicc    -MD -MT CMakeFiles/cmTC_ef9a1.dir/testCCompiler.c.o -MF CMakeFiles/cmTC_ef9a1.dir/testCCompiler.c.o.d -o CMakeFiles/cmTC_ef9a1.dir/testCCompiler.c.o -c /p/project1/cdetect/vanhulten1/models/TSMP2/bld/JURECADC_eCLM/CMakeFiles/CMakeScratch/TryCompile-09oGk3/testCCompiler.c
    icc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is deprecated and will be removed from product release in the second half of 2023. The Intel(R) oneAPI DPC++/C++ Compiler (ICX) is the recommended compiler moving forward. Please transition to use this compiler. Use '-diag-disable=10441' to disable this message.
    Linking C executable cmTC_ef9a1
    /p/software/fs/jurecadc/stages/2024/software/CMake/3.26.3-GCCcore-12.3.0/bin/cmake -E cmake_link_script CMakeFiles/cmTC_ef9a1.dir/link.txt --verbose=1
    /p/software/jurecadc/stages/2023/software/psmpi/5.7.0-1-intel-compilers-2022.1.0/bin/mpicc CMakeFiles/cmTC_ef9a1.dir/testCCompiler.c.o -o cmTC_ef9a1 
    icc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is deprecated and will be removed from product release in the second half of 2023. The Intel(R) oneAPI DPC++/C++ Compiler (ICX) is the recommended compiler moving forward. Please transition to use this compiler. Use '-diag-disable=10441' to disable this message.
    ld: warning: libcudart.so.11.0, needed by /p/software/fs/jurecadc/stages/2023/software/psmpi/5.7.0-1-intel-compilers-2022.1.0/lib/libmpi.so, not found (try using -rpath or -rpath-link)
    ld: /p/software/fs/jurecadc/stages/2023/software/psmpi/5.7.0-1-intel-compilers-2022.1.0/lib/libmpi.so: undefined reference to `cudaGetDeviceCount@libcudart.so.11.0'
    ld: /p/software/fs/jurecadc/stages/2023/software/psmpi/5.7.0-1-intel-compilers-2022.1.0/lib/libmpi.so: undefined reference to `cudaGetErrorString@libcudart.so.11.0'

    ld: /p/software/fs/jurecadc/stages/2023/software/psmpi/5.7.0-1-intel-compilers-2022.1.0/lib/libmpi.so: undefined reference to `__cudaRegisterFatBinaryEnd@libcudart.so.11.0'
    gmake[1]: *** [CMakeFiles/cmTC_ef9a1.dir/build.make:100: cmTC_ef9a1] Error 1
    gmake[1]: Leaving directory '/p/project1/cdetect/vanhulten1/models/TSMP2/bld/JURECADC_eCLM/CMakeFiles/CMakeScratch/TryCompile-09oGk3'
    gmake: *** [Makefile:127: cmTC_ef9a1/fast] Error 2
    

@jjokella
Contributor

jjokella commented Dec 2, 2024

Hm, I cannot reproduce. For me

git clone https://github.com/HPSCTerrSys/TSMP2.git
pushd TSMP2
git co stages-2024
./build_tsmp2.sh --eCLM

runs through. The first notable difference is that the IntelLLVM compilers are loaded:

== CMAKE GENERATE PROJECT start
-- The C compiler identification is IntelLLVM 2023.2.0
-- The CXX compiler identification is IntelLLVM 2023.2.0
-- The Fortran compiler identification is IntelLLVM 2023.2.0
-- Detecting C compiler ABI info

Maybe a completely clean start? Just seeing that you had to answer the overwrite question. Possibly Paul's snippet from above can be of service.
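
To compare two configure logs quickly, the compiler identification lines can be extracted with `sed`. A minimal sketch, assuming the log contains lines in the format shown in the output above; the function name `compiler_ids` and the log file name in the usage comment are hypothetical.

```sh
# Hypothetical helper: read a CMake configure log on standard input and print
# "<language> <compiler id>" for each compiler identification line.
compiler_ids() {
    sed -n 's/^-- The \([A-Za-z]*\) compiler identification is \([^ ]*\).*/\1 \2/p'
}

# Usage:
#   compiler_ids < configure.log
```

An `Intel` result where `IntelLLVM` is expected points to the classic compilers from the older stage being picked up.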

Member

@s-poll s-poll left a comment

Nice update. I have not checked everything yet, but I wanted to give feedback first.

For future PRs I would opt to make one PR for each feature. This makes it much easier to review. In this case, that would be at least:

  • Update of Stages/2024
  • Update of component models
  • Integration of CI workflow
  • ...

I know that the borders are sometimes blurry, but IMO this makes the process of reviewing faster, easier and less error-prone.

@kvrigor
Member Author

kvrigor commented Dec 3, 2024

> For future PRs I would opt to make one PR for each feature. This makes it much easier to review. In this case, that would be at least:

This is a good rule of thumb, which I also try to observe when possible. As an exception, some features are so tightly coupled that breaking them into multiple PRs would be awkward or counterproductive. This is not clear-cut and needs to be decided case by case. For this PR, the underlying goal is to support the GCC and IntelLLVM toolchains without breaking one or the other, so I think the feature set in this PR still makes sense.

@kvrigor kvrigor requested a review from s-poll December 4, 2024 06:19

@DCaviedesV DCaviedesV left a comment

I'm fine with this PR as it currently is.

Member

@s-poll s-poll left a comment

I am fine with this PR. I have not tested it, but I think we can adapt later if needed, as this has already been in the pipeline for quite a long time.

@kvrigor
Member Author

kvrigor commented Dec 4, 2024

Thanks everyone for all the comments and reviews! I plan to merge this PR tomorrow afternoon (5 Dec.) so there's still some time to do final checks.

@kvrigor kvrigor changed the title from "Stages/2024 + CI" to "Supported Stages/2024 and added a basic CI test" Dec 5, 2024
@kvrigor kvrigor merged commit 5b29084 into master Dec 5, 2024
1 check passed
@kvrigor kvrigor deleted the stages-2024 branch December 5, 2024 13:39
5 participants